Audio enhanced electronic insertion of indicia into video

ABSTRACT

A system and method (40) of altering the audio portion of a live television broadcast signal substantially in real time. The system is used to enhance the effects of live video insertion systems. The broadcast signal is received, separated into audio and video channels, and passed to a pattern recognition unit (72) in order to recognize predetermined events. The broadcast audio is then altered based on the occurrence of said events. Alterations (68, 70) include modifications to attributes such as volume, tone, pitch, synchronization, echo, reverberation, and frequency profile. Once altered, the audio is re-synchronized (80) with the video channel, which has undergone its own modification.

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims the priority and benefit of U.S. Provisional Patent Application No. 60/016,419 filed on Apr. 29, 1996 entitled “Audio Enhanced Electronic Insertion of Indicia into Video”.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to the improved performance of devices for inserting realistic indicia into video sequences, and particularly, to the enhancement by the addition of audio effects related to the insertions.

2. Description of the Related Art

Electronic devices for inserting electronic images into live video signals, such as described in U.S. Pat. No. 5,264,933 to Rosser, et al. and U.S. Pat. No. 5,491,517 to Kreitman et al., have been developed and used commercially for the purpose of inserting advertising and other indicia into video sequences, including live broadcasts of sporting events. To varying degrees of success, these devices seamlessly and realistically incorporate indicia into the original video in real time. Realism is maintained even as the original scene is zoomed, panned, or otherwise altered in size or perspective.

U.S. Pat. No. 5,264,933 to Rosser, et al. discusses having the video insert respond to sound in the event, i.e. having the video insert pulse or change color in response to a rise in crowd noise. It does not, however, disclose the reverse possibility of adding a sound effect to the audio to coincide with a video insert, i.e. adding a beat to the program sound to coincide with the pulsing of the insertion, or altering the program audio in response to audio or visual cues in the program or in response to some operator command.

Other patents concerning video insertion technology, such as U.S. Pat. No. 5,491,517 to Kreitman et al., U.S. Pat. No. 5,353,392 to Luquet et al., or U.S. Pat. No. 5,488,675 to Hanna, or PCT applications PCT/US94/01679 and PCT/US94/11527 of Sharir and Tamir, confine themselves strictly to the video portion of a broadcast. None of the aforementioned patents or applications discloses methods for making an inserted indicia look more realistic by adding synchronized audio effects.

SUMMARY OF THE INVENTION

The invention comprises both a method and an apparatus to enhance real time insertion of indicia into video by altering the audio portion of a broadcast as well as the video portion of the broadcast. The invention applies equally well to real time insertion of video indicia accomplished by means of pattern recognition technology; by means of camera motion sensors attached to the cameras recording the event; or by a combination of pattern recognition and camera motion sensors.

In the present invention each still or animated video sequence intended for insertion into the live video has an associated audio sequence. When the still image, animated image sequence, or video sequence is inserted, the associated sound sequence is also activated. Sound activation may be triggered by the start of the insertion; by some action in the video portion of the insertion; by some action in either the video or audio channel of the broadcast; by some combination of action in the audio and video channels; or, partially or wholly, by an operator. In addition to triggering, the playing, volume, modulation, termination, or any other attribute of the associated sound sequence may be influenced by the inserted image, animation or video, by the audio or video channel of the event, by some combination of the audio and video channels of the event, or partly or wholly by an operator.

The associated audio sequence is stored either digitally in system memory in the same manner as the video sequences are stored, or separately on either an analogue or digital medium.

A live video insertion system is enhanced so that, in addition to channels for program video and video insertion, an enhanced audio processor is added within an audio channel. In a standard live video insertion system the audio channel is merely a delay line allowing the program audio to be delayed during video processing. The enhanced audio processor interacts with the pattern recognition and tracking part of the live video insertion system (LVIS™). If the audio mixing is done digitally, there is also means to convert the program audio from analogue to digital and back to analogue after the mixing is done.

The enhanced audio processor also includes means for audio pattern recognition for adding an audio sequence to the broadcast audio, or otherwise altering the broadcast audio. Audio pattern recognition can be used alone or in conjunction with commands from the video pattern recognition and tracking module of the LVIS™. It can also be used in conjunction with operator commands.

Making an inserted indicia appear as if it is actually part of the original video scene is an important aspect of the technology. Appropriate audio cues can considerably enhance the visual illusion that the inserted video indicia is part of the original scene. Audio enhancement of the illusion is particularly effective if the inserted indicia is an animated sequence and the added audio is timed to coincide with specific actions of the animation. For example, an inserted video indicia can be programmed to pulse on and off. To enhance this illusion, a sound effect can increase or decrease in volume in sync with the pulsing video insertion. Other examples include changes in pitch, tone, reverberation, added echo, spoken dialogue, or musical jingles of an audio insert that are synchronized with changes in the inserted video.

Alteration of the original program sound rather than addition of a separate audio insert can be done as well. For instance, crowd noise could be artificially modulated to coincide with a change in the inserted logo. Consider an animated version of a team mascot. As the artificially inserted team mascot raises and lowers its arms, the crowd volume could increase or decrease accordingly, adding to the illusion that the mascot was actually in the stadium.
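The modulation of program sound in step with an animated insert might be sketched as follows. This is a minimal illustration only: the function name `pulse_gain`, the raised-cosine envelope, and the `low` and `high` gain limits are assumptions introduced here and are not taken from the specification.

```python
import math


def pulse_gain(field, period_fields, low=0.5, high=1.0):
    """Return a gain for the given video field, cycling once per animation period.

    Hypothetical helper: the gain is `low` at the start of each period
    (e.g. mascot arms down) and `high` at mid-period (arms up).
    """
    phase = (field % period_fields) / period_fields
    # Raised cosine: 0.0 at phase 0, 1.0 at phase 0.5, back to 0.0 at phase 1.
    level = 0.5 * (1.0 - math.cos(2.0 * math.pi * phase))
    return low + (high - low) * level


def modulate(samples, gain):
    """Scale one field's worth of audio samples by a single gain value."""
    return [gain * s for s in samples]
```

Because the gain is indexed by video field, the same field counter that drives the animation can drive the audio level, which keeps the two in step.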

Further enhancements include synchronizing an audio addition or the actual broadcast audio with an audio or visual cue occurring in the action of the event being broadcast. In a baseball game, for instance, a cartoon character insertion can react to an audio event such as the crack of the bat with a suitable phrase in the distinctive voice of the character. Or, the reaction could be synchronized to a visual cue such as swinging the bat. A combination of visual and audio cues may be partially or entirely operator activated and synchronized to an event like the start or end of an inning. Additionally, if the insert in a baseball game appeared to be hit by the ball, a suitable sound appearing to come from the injured insert could be added to the program audio. Utilizing known speech recognition techniques, the audio cue could be a command, a well known phrase, or team name.

Added sound can follow the movement of a video insert. For instance, the volume associated with the insert could increase as the camera zooms in and the insertion grows in size. For stereo broadcasts, the ratio of the left and right channels can be altered as the insert is panned off to the side such that the sound seems to follow the insert.
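One way the zoom and pan behaviour could be realized is sketched below. The function name, the linear scaling of volume with zoom, and the constant-power pan law are all assumptions chosen for illustration; the specification does not prescribe a particular gain law.

```python
import math


def insert_audio_gains(zoom, pan, base_gain=1.0):
    """Return (left_gain, right_gain) for the sound attached to an insert.

    zoom: apparent size of the insert relative to a reference view (1.0 = reference).
    pan:  horizontal position of the insert, -1.0 (far left) to +1.0 (far right).
    Both parameters are hypothetical and would come from the tracking data.
    """
    pan = max(-1.0, min(1.0, pan))
    volume = base_gain * zoom  # assumed: louder as the insert grows in size
    # Constant-power pan: map pan to an angle between 0 and pi/2.
    angle = (pan + 1.0) * math.pi / 4.0
    return volume * math.cos(angle), volume * math.sin(angle)
```

With this law the summed power of the two channels stays constant as the insert moves across the frame, so only the apparent direction of the sound changes during a pan.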

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a schematic plan of a live video insertion system modified to include an enhanced audio processor.

FIG. 2 is a flow diagram showing the flow of data through the system as each field of video is processed.

FIG. 3 illustrates a more detailed schematic drawing of the enhanced audio processor.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

During the course of this description like numbers will be used to identify like elements according to the different figures which illustrate the invention.

A typical live video insertion system (LVIS™) is described in detail in several related applications: U.S. Provisional Patent Application No. 60/000,279 filed on Jun. 16, 1995 entitled “APPARATUS AND METHOD OF INSERTING IMAGES INTO TELEVISION DISPLAYS”; U.S. patent application filed Nov. 28, 1995 entitled “SYSTEM AND METHOD FOR INSERTING STATIC AND DYNAMIC IMAGES INTO A LIVE VIDEO BROADCAST”; U.S. patent application No. 08/381,088 filed Jan. 31, 1995 entitled “LIVE VIDEO INSERTION SYSTEM”; and U.S. patent application No. 08/580,892 filed Dec. 12, 1995 entitled “METHOD OF TRACKING SCENE MOTION FOR LIVE VIDEO INSERTION SYSTEMS”, the teachings of which are hereby included by reference.

In a typical LVIS™ 40, as shown schematically in FIG. 1, a video signal from a camera 32 recording an event is brought into a Search/Tracking/Verification module 42. The Search/Tracking/Verification module 42 is equivalent to the camera parameter extraction module in pending PCT applications PCT/US94/01679 and PCT/US94/11527 of Sharir and Tamir. Search/Tracking/Verification module 42 uses pattern recognition, information from sensors attached to the camera, the tally signal from a broadcast switcher, or some combination of these three sources of information, to determine which camera is viewing the scene. Module 42 then calculates the orientation and field of view of the camera, expressing them as model warp parameters 20 (FIG. 2) relating the current camera view to a reference view. The warp parameters 20 are derived from pattern recognition and are expressed as an affine transformation with respect to a reference view from that camera. However, the warp parameters may be any suitable mathematical transform, including, but not restricted to, models such as a full perspective transform.
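As an illustration of how affine warp parameters relate a reference view to the current camera view, a minimal sketch follows. The six-value parameter ordering `(a, b, c, d, tx, ty)` and the function names are assumptions made for this example; the specification does not define a parameter layout.

```python
def apply_affine(params, point):
    """Map a reference-view point into the current view.

    params: (a, b, c, d, tx, ty), a hypothetical layout representing
            x' = a*x + b*y + tx  and  y' = c*x + d*y + ty.
    point:  (x, y) in reference-view coordinates.
    """
    a, b, c, d, tx, ty = params
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)


def warp_corners(params, corners):
    """Warp the corner points of an insert region into the current view."""
    return [apply_affine(params, p) for p in corners]
```

For example, a camera that has zoomed in by a factor of two and shifted the scene ten pixels to the right would correspond, under this layout, to `(2.0, 0.0, 0.0, 2.0, 10.0, 0.0)`.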

Referring to FIG. 2, the warp parameters 20 are used in conjunction with a synthetic reference image 22 in occlusion processor 44 to generate a key 32, which indicates which parts of the region of live video 28 where a logo 26 is to be inserted contain objects that logo 26 should not obscure.

A major enhancement over conventional blue screen occlusion processing technology is that the occlusion of the present system can be performed on textured surfaces. Insertion processor 46 (FIG. 1) takes key 32 and a logo image 26 and places logo image 26 into the live video 28 so that logo image 26 looks as if it is part of the original scene.
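The role of the key in the insertion step can be illustrated with a simple per-pixel blend. This is a sketch under stated assumptions: the key is taken to be a value between 0.0 and 1.0 where 1.0 means the logo is fully visible and 0.0 means a foreground object occludes it, and pixel values are single-channel floats. Neither convention is specified in the text.

```python
def composite_pixel(video, logo, key):
    """Blend one logo pixel into one video pixel using the key value.

    key = 1.0 shows the logo; key = 0.0 keeps the live video
    (e.g. a player standing in front of the insert location).
    """
    return key * logo + (1.0 - key) * video


def composite_row(video_row, logo_row, key_row):
    """Apply the blend to a row of pixels of equal length."""
    return [composite_pixel(v, l, k)
            for v, l, k in zip(video_row, logo_row, key_row)]
```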

Logo image 26 may be another source of video, including an animated clip from a video storage device 36. The video storage device 36 is a digital tape recorder or a digital video disk or other suitable storage medium. Video storage device 36 is controlled by information from the search/track/verify module 42 or by a system operator so that the appropriate animation is selected and set in motion at the appropriate time to match action in the live video broadcast. For instance, in a baseball game an animation sequence could be a sponsor's logo morphing into a team mascot just after the batter has attempted to hit the ball. Different animation sequences can be selected by an operator depending on whether the swing attempt was successful or if the pitch was a strike or a ball.

In the present invention, a typical live video insertion system (LVIS™) 40 is modified by the addition of an enhanced audio processor 60, as shown schematically in FIG. 1. Enhanced audio processor 60 is a micro-processor that interprets and responds to input from image recognition and tracking module 42 of LVIS™ system 40. An audio coordinator 62 (FIG. 3) is programmed for interpreting and responding to input from video pattern recognition unit 64 which is part of the Search/Track/Verify unit 42.

Enhanced audio processor unit 60 further responds to direct operator control as the audio coordinator 62 also interprets and responds to operator input unit 66 which forms a part of the user interface.

Additionally, enhanced audio processor 60 synchronously adds or mixes an associated audio insert with the broadcast audio utilizing any of the control signals. This includes signals from its own pattern recognition module since audio coordinator 62 and audio mixer unit 68 are programmable microprocessors. Audio mixer unit 68 may be a commercial unit such as the WhirlWind Inc., of Rochester, N.Y. “MIX-44”, which is a fully programmable, computer controllable audio mixing machine.

Enhanced audio processor 60 can also modify the broadcast audio volume, tone, pitch and can create echoes, reverberations and other similar audio effects. Audio effects unit 70 can be an off the shelf commercial unit such as the Applied Research Technology Inc., of Rochester, N.Y. “Effects Network”, which is a fully programmable, computer controllable audio multi-effects machine.

Enhanced audio processor unit 60 also has means for audio pattern recognition of sounds in the broadcast audio including voice recognition. Audio pattern recognition unit 72 is a programmable micro-processor using one or more of the well known audio pattern recognition algorithms discussed, for instance, in U.S. Pat. No. 4,802,231 to Davis, or U.S. Pat. No. 4,713,778 to Baker.

In alternative embodiments, simplified versions of the enhanced audio processor 60 may have any subset of these key characteristics.

An innovation of the present invention includes the addition of an audio storage device 38 which stores sound effects related to the video insert animations stored in video storage unit 36. Enhanced audio processor 60 is no longer just a delay pipeline as in standard LVIS™ systems. The heart of enhanced audio processor 60 is audio coordinator unit 62. Audio coordinator unit 62 uses tracking or other computer generated information, operator input, program generated parameters, or some combination thereof, to mix an audio clip from audio storage device 38 with broadcast audio 16. Enhanced audio processor 60 is able to affect all necessary attributes of both the broadcast audio relayed through the system and an associated audio clip mixed into the broadcast audio by means of audio effects unit 70. Said attributes include, but are not limited to, volume, tone, echo, distortion, fade, reverberation, and frequency profile. In addition, audio coordinator 62 is able to affect the start, end, play speed, synchronization, and other such attributes of the associated audio clip. All audio manipulations are a synchronized function of input from the computer, from other suitable external clocks or triggers, from an operator, or from any combination thereof.
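Mixing a stored clip into the broadcast audio in the digital domain might look like the following sketch. The function name `mix_clip`, the representation of audio as lists of floating-point samples in the range -1.0 to 1.0, and the hard clipping at the end are assumptions for illustration and do not describe the commercial mixer named above.

```python
def mix_clip(broadcast, clip, start, clip_gain=1.0, program_gain=1.0):
    """Mix `clip` into `broadcast`, beginning at sample index `start`.

    broadcast, clip: lists of float samples in [-1.0, 1.0] (assumed format).
    start:           sample offset chosen by the coordinator for synchronization.
    Returns a new list the same length as `broadcast`.
    """
    out = [program_gain * s for s in broadcast]
    for i, sample in enumerate(clip):
        j = start + i
        if 0 <= j < len(out):  # ignore clip samples that fall outside the buffer
            out[j] += clip_gain * sample
    # Hard-limit the sum so the mixed signal stays in range.
    return [max(-1.0, min(1.0, s)) for s in out]
```

Here `start` controls synchronization of the clip, `clip_gain` its volume, and `program_gain` the volume of the broadcast audio itself, which corresponds to a few of the attributes listed above.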

Enhanced audio processor 60 also incorporates an audio pattern recognition unit 72 which has signal processing capabilities like those disclosed in U.S. Pat. No. 4,802,231 to Davis, or U.S. Pat. No. 4,713,778 to Baker. Enhanced audio processor 60 can recognize simple speech and other distinct audio signals, monitor their levels and other attributes, and use their characteristics to control or modify the associated audio clip mixed into the broadcast audio. Said attributes include, but are not limited to, start, end, play speed, synchronization, volume, tone, pitch, echo, reverberation and frequency profile. Audio coordinator 62 can also use recognized audio patterns to modify certain characteristics of the broadcast audio such as volume, tone, pitch, echo, reverberation and frequency profile.

In the preferred embodiment of the present invention, Search/Track/Verify module 42 is enhanced so that in addition to being able to recognize and track objects, landmarks, and texture for the purpose of seamlessly inserting indicia in the overall scene, it uses the same techniques to recognize and/or track the motion of events occurring within the scene. Such events include, but are not limited to, the swinging of a baseball bat or the trajectory of a tennis ball. The search/track/verify module 42 feeds this information to audio coordinator 62 for the purpose of controlling or modifying either or both of the associated audio and broadcast audio in the manner discussed above.

Audio coordinator 62 can also adjust the audio associated with the insertions and the broadcast audio via direct operator commands. This is accomplished by operator unit 66 which is part of the LVIS™ user interface. Audio coordinator 62 can also act in response to a combination of commands from the operator, the visual image recognition and tracking sections, and the audio signal recognition and monitoring sections, and use those combinations, which may include one or more dependent occurrences over time, to modify, synchronize or otherwise adjust attributes of both the associated audio and the broadcast audio. The modifications include, but are not limited to, changes in volume, tone, pitch, synchronization, echo, reverberation, and frequency profile of the broadcast audio, and start, end, play speed, volume, tone, pitch, synchronization, echo, reverberation, and frequency profile of the associated insert audio.

A schematic representation of the preferred embodiment of the enhanced audio processor 60 is illustrated in FIG. 3. The broadcast audio is first digitized using an audio analogue to digital convertor 74. The digitized program audio is stored in program audio store 76 which corresponds to audio delay units 16 in the conventional LVIS™ audio path (FIG. 2). The audio signals then pass through audio pattern recognition unit 72, which, under control of audio coordinator 62, is capable of recognizing audio patterns, including speech. Recognition of patterns or speech by audio pattern recognition unit 72 is used by audio coordinator 62 to control the type and timing of adjustments to the broadcast audio and the associated audio by means of audio mixer 68 and audio effects unit 70.

Audio coordinator 62 also receives information from video pattern recognition unit 64, field synchronizer 76, operator input 66, and from the external clocks and triggers interface unit 78 for controlling the type and timing of adjustments to the broadcast audio and the associated insert audio by means of audio mixer 68 and audio effects unit 70. The audio sequence to be added to the program audio is stored in the associated audio store 84 which is also under control of audio coordinator 62. Audio coordinator 62 determines what is transferred to audio mixer 68 and when said transfer occurs. The resultant mixed program audio passes through audio effects unit 70 where further adjustments to attributes like volume, tone, pitch, echo, reverberation and frequency profile are made under the control of audio coordinator 62.
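The coordinating role described above, deciding what is sent to the mixer and when, could be modelled as a lookup from recognized events to mix commands. The sketch below is hypothetical throughout: the class names, the rule table, the event and clip names, and the `latency_fields` offset are all introduced for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str  # assumed labels: "video", "audio", "operator", or "trigger"
    name: str    # e.g. "bat_crack"; event names are hypothetical
    field: int   # video field index at which the event was recognized


@dataclass(frozen=True)
class MixCommand:
    clip: str         # identifier of a clip in the associated audio store
    start_field: int  # field at which the mixer should begin playing it
    gain: float


class AudioCoordinator:
    """Maps recognized events to commands for the mixer (illustrative only)."""

    def __init__(self, rules, latency_fields=0):
        # rules: {(source, name): (clip, gain)}
        self._rules = rules
        self._latency = latency_fields

    def handle(self, event):
        """Return a MixCommand for the event, or None if no rule matches."""
        rule = self._rules.get((event.source, event.name))
        if rule is None:
            return None
        clip, gain = rule
        return MixCommand(clip, event.field + self._latency, gain)
```

A rule such as `{("audio", "bat_crack"): ("mascot_cheer", 0.8)}` would then cause a recognized bat crack to schedule the named clip a fixed number of fields later, allowing for processing delay.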

The resultant audio is then stored in a multi-field program audio store 80 for the appropriate amount of time (a few video fields) to synchronize it with the video image before being converted back to analogue form using an audio digital to analogue convertor 82. The analogue audio output is then incorporated into the video signal to form a standard broadcast signal such as NTSC or PAL and broadcast.
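A multi-field store of this kind behaves as a fixed-length first-in, first-out delay. A minimal sketch follows, assuming one block of audio is pushed per video field; the class name `FieldDelay` and the `silence` placeholder are hypothetical.

```python
from collections import deque


class FieldDelay:
    """Delay audio by a fixed number of video fields (illustrative only)."""

    def __init__(self, fields, silence=None):
        # Pre-fill so the first `fields` outputs are the silence placeholder.
        self._buffer = deque([silence] * fields)

    def push(self, field_audio):
        """Store this field's audio and return the audio from `fields` ago."""
        self._buffer.append(field_audio)
        return self._buffer.popleft()
```

Setting the delay equal to the number of fields the video path still needs for processing brings the altered audio back into step with the video output.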

Although the preferred embodiment described has the audio mixed in the digital domain, the entire audio operation could be done in the analogue domain using appropriate equipment well known in the art.

The preferred embodiment as shown in FIG. 2 describes audio mixer 68 as being used in the fifth field of the overall LVIS™ cycle and audio effects generator 70 used in the sixth field. Both audio mixer 68 and audio effects generator 70, however, could be used anywhere in the processing cycle as long as appropriate offsets were used between the video field stored in video insertion store 36 and the audio field stored in associated audio store 38. In particular, both audio mixer 68 and audio effects generator 70 can be used in the last field of processing, coincident with the combination of the logo, final key and video to form video output 30. This would have the advantage of only requiring a single multi-field program audio store 80 as opposed to the layout of the enhanced audio processor 60 shown in FIG. 3 which requires two such devices.

It is to be understood that the apparatus and method of operation taught herein are illustrative of the invention. Modifications may readily be devised by those skilled in the art without departing from the spirit or scope of the invention.

What is claimed is:
 1. A method of altering the audio portion of a live television broadcast signal substantially in real time, said method comprising the steps of: (a) receiving 18 said live television broadcast signal; (b) separating the video and audio portions of said live television broadcast signal into separate channels; (c) delaying 28 the video portion of said live television broadcast signal; (d) recognizing 72 an event within the audio or video portion of said live television broadcast signal; (e) altering 68, 70 the audio portion of said live television broadcast signal based upon said event; (f) re-synchronizing 80 the audio and video portions of said live television broadcast signal; and (g) outputting 50 the audio altered live television broadcast signal.
 2. The method of claim 1 wherein said altering 68 step (e) optionally includes mixing 68 a second audio channel containing a predetermined audio clip 84 into said live television broadcast audio channel.
 3. The method of claim 2 wherein said audio channels include audio attributes such as volume, tone, pitch, synchronization, echo, reverberation, and frequency profile.
 4. The method of claim 3 wherein said altering 70 step (e) modifies at least one of said audio attributes.
 5. The method of claim 4 wherein said recognizing 72 step (d) is based on audio pattern recognition of said event.
 6. The method of claim 4 wherein said recognizing step (d) is based on video pattern recognition 64 of said event.
 7. The method of claim 4 wherein said recognizing step (d) is based on direct operator input 66.
 8. A method of altering the audio portion of a live television broadcast signal substantially in real time, said method comprising the steps of: (a) receiving 18 said live television broadcast signal; (b) converting 74 said live television broadcast signal from the analog domain to the digital domain; (c) separating the video and audio portions of said live television broadcast signal into separate channels; (d) delaying 28 the video portion of said live television broadcast signal; (e) recognizing an event within the audio 72 or video 64 portion of said live television broadcast signal; (f) altering 68, 70 the audio portion of said live television broadcast signal based upon said event; (g) re-synchronizing 80 the audio and video portions of said live television broadcast signal; (h) re-converting 82 said live television broadcast signal back to the analog domain; and (i) outputting 50 the audio altered live television broadcast signal.
 9. The method of claim 8 wherein said altering step (f) optionally includes mixing 68 a second audio channel containing a predetermined audio clip 84 into said live television broadcast audio channel.
 10. The method of claim 9 wherein said audio channels include audio attributes 70 such as volume, tone, pitch, synchronization, echo, reverberation, and frequency profile.
 11. The method of claim 10 wherein said altering 70 step (f) modifies at least one of said audio attributes.
 12. The method of claim 11 wherein said recognizing step (e) is based on audio pattern recognition 72 of said event.
 13. The method of claim 11 wherein said recognizing step (e) is based on video pattern recognition 64 of said event.
 14. The method of claim 11 wherein said recognizing step (e) is based on direct operator input 66.
 15. A method of altering the audio and video portion of a live television broadcast signal substantially in real time, said method comprising the steps of: (a) receiving 18 said live television broadcast signal; (b) converting 74 said live television broadcast signal from the analog domain to the digital domain; (c) separating the video and audio portions of said live television broadcast signal into separate channels; (d) delaying 28 the video portion of said live television broadcast signal; (e) recognizing at least one event within the audio 72 or video 64 portion of said live television broadcast signal; (f) altering 68, 70 the audio portion of said live television broadcast signal based upon said at least one event; (g) altering the video portion of said live television broadcast signal based upon the same or a different one of said at least one event; (h) re-synchronizing 80 the audio and video portions of said live television broadcast signal; (i) re-converting 82 said live television broadcast signal back to the analog domain; and (j) outputting 50 the audio altered live television broadcast signal.
 16. The method of claim 15 wherein said altering step (f) optionally includes mixing 68 a second audio channel containing a predetermined audio clip 84 into said live television broadcast audio channel.
 17. The method of claim 16 wherein said audio channels include audio attributes such as volume, tone, pitch, synchronization, echo, reverberation, and frequency profile.
 18. The method of claim 17 wherein said altering 70 step (f) modifies at least one of said audio attributes.
 19. The method of claim 18 wherein said recognizing step (e) is based on audio pattern recognition 72 of said event.
 20. The method of claim 18 wherein said recognizing step (e) is based on video pattern recognition 64 of said event.
 21. The method of claim 18 wherein said recognizing step (e) is based on direct operator input 66.
 22. A system for altering the audio portion of a live television broadcast signal substantially in real time comprising: separation means for separating the audio and video portions of said live television broadcast signal into separate channels which are independently manipulatable; audio pattern recognition means 72 for recognizing an event within the audio portion of said live television broadcast signal; audio processor means 60 for receiving said audio portion of said live television broadcast signal and altering same based upon said event; and, re-synchronization means for re-synchronizing the audio and video portions of said live television broadcast signal after said live television broadcast signal audio portion has been altered.
 23. The system of claim 22 wherein said audio processor means 60 further comprises: first audio storage means 80 for storing and delaying said live television broadcast signal audio portion; second audio storage means 84 for storing an insertable audio clip; audio coordinator means 62 for receiving information regarding said event and controlling the type and timing of the altering of said live television broadcast signal audio portion; audio mixer means 68 for selectively mixing, as controlled by said audio coordinator means, said insertable audio clip and said live television broadcast signal audio portion; and audio effects means 70 for selectively modifying attributes of said live television broadcast signal audio portion and said mixed live television broadcast signal audio portion, as controlled by said audio coordinator means.
 24. The system of claim 23 wherein said event information further includes signals received into said audio coordinator means 62 from sources outside of said audio processor means 60.
 25. The system of claim 24 wherein said outside sources comprise operator input means 66, video pattern recognition means 64, field synchronization means 76, external trigger means 78, external clock means 78, or any combination thereof.
 26. The system of claim 25 wherein said attributes include volume, tone, pitch, synchronization, echo, reverberation, and frequency profile.